Search Result

Multi-dimensional text clustering with user behavior characteristics

LI Wanying, HUANG Ruizhang, DING Zhiyuan, CHEN Yanping, XU Liyang

Journal of Computer Applications 2018, 38 (11): 3127-3131. DOI: 10.11772/j.issn.1001-9081.2018041357

Traditional multi-dimensional text clustering generally extracts features from text content alone and seldom considers the interactions between users and text data, such as likes, forwards, reviews, follows, and references. Moreover, it mainly combines the spatial dimensions linearly and fails to consider the relationships between attributes within each dimension. To make effective use of text-related user behavior information, a Multi-dimensional Text Clustering with User Behavior Characteristics (MTCUBC) model was proposed. Based on the principle that the similarity between texts should be consistent across different spaces, user behavior information was used as a constraint on text content clustering to adjust the similarity, and the distance between texts was then refined by metric learning, which improved the clustering results. Extensive experiments verify that the proposed MTCUBC model is effective and that it has a clear advantage over linearly combined multi-dimensional clustering on high-dimensional sparse data.
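
The idea of constraining content similarity with user behavior can be illustrated with a small sketch. This is not the MTCUBC algorithm, whose details the abstract does not give; it swaps in a deliberately simple diagonal metric as a stand-in for the metric learning step. The helper names (`must_link_pairs`, `diagonal_metric`), the Jaccard threshold, and the toy data are all assumptions made for illustration.

```python
from itertools import combinations

def jaccard(a, b):
    """Overlap between two sets of user ids (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def must_link_pairs(users_per_text, threshold=0.5):
    """Hypothetical constraint rule: texts whose interacting users
    overlap strongly are assumed to belong together."""
    n = len(users_per_text)
    return [(i, j) for i, j in combinations(range(n), 2)
            if jaccard(users_per_text[i], users_per_text[j]) >= threshold]

def diagonal_metric(features, pairs, eps=1e-6):
    """Simplified stand-in for metric learning: weight each content
    dimension inversely to how much must-link pairs disagree on it."""
    dims = len(features[0])
    if not pairs:
        return [1.0 / dims] * dims  # no constraints: uniform weights
    spread = [
        sum((features[i][k] - features[j][k]) ** 2 for i, j in pairs) / len(pairs)
        for k in range(dims)
    ]
    raw = [1.0 / (s + eps) for s in spread]
    total = sum(raw)
    return [r / total for r in raw]

def weighted_distance(x, y, w):
    """Euclidean distance under per-dimension weights w."""
    return sum(wk * (a - b) ** 2 for wk, a, b in zip(w, x, y)) ** 0.5

# Toy data (assumed): texts 0 and 1 attract the same users but differ
# in the second content feature; text 2 attracts a different user.
features = [[1.0, 0.0], [1.0, 1.0], [0.0, 0.5]]
users = [{"u1", "u2"}, {"u1", "u2", "u3"}, {"u9"}]

pairs = must_link_pairs(users)
weights = diagonal_metric(features, pairs)
d01 = weighted_distance(features[0], features[1], weights)
d02 = weighted_distance(features[0], features[2], weights)
```

In this toy setup the behavior-derived constraint downweights the feature on which texts 0 and 1 disagree, so their adjusted distance becomes smaller than the distance between texts 0 and 2. Any standard clustering algorithm could then be run on the adjusted distances.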

Multi-source text topic mining model based on Dirichlet multinomial allocation model

XU Liyang, HUANG Ruizhang, CHEN Yanping, QIAN Zhisen, LI Wanying

Journal of Computer Applications 2018, 38 (11): 3094-3099. DOI: 10.11772/j.issn.1001-9081.2018041359

With the rapid growth in the number of text data sources, topic mining for multi-source text data has become a research focus in text mining. Because traditional topic models are mainly designed for a single source, applying them directly to multi-source data has many limitations. Therefore, a multi-source topic model based on the Dirichlet Multinomial Allocation (DMA) model, named MSDMA (Multi-Source Dirichlet Multinomial Allocation), was proposed; it takes into account the differences between sources in topic word distributions and the nonparametric clustering property of DMA. The main contributions of the proposed model are as follows: 1) it takes into account the characteristics of each source when modeling topics, and can learn the source-specific word distributions of topic k; 2) it improves topic discovery for sources with high noise and little information through knowledge sharing; 3) it automatically learns the number of topics within each source, without requiring the number to be specified in advance. Experimental results on a simulated dataset and two real datasets indicate that the proposed model extracts topic information more effectively and efficiently than state-of-the-art topic models.
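
As a rough illustration of why a DMA-style prior lets the number of topics be learned rather than fixed, the sketch below samples from a finite mixture with a symmetric Dirichlet(alpha/K) prior on the mixing weights, which is the usual formulation of DMA. It shows only the prior, not the MSDMA model or its inference, and it ignores sources and word distributions entirely. The function names and the values of `alpha`, `K`, and the document count are assumptions chosen for the example.

```python
import random

def sample_symmetric_dirichlet(alpha, K, rng):
    """Draw mixing weights from Dirichlet(alpha/K, ..., alpha/K)
    by normalizing independent Gamma(alpha/K, 1) draws."""
    draws = [rng.gammavariate(alpha / K, 1.0) for _ in range(K)]
    total = sum(draws)
    if total == 0.0:
        # Guard against numerical underflow of every draw.
        return [1.0 / K] * K
    return [d / total for d in draws]

def assign_documents(n_docs, weights, rng):
    """Assign each document to one topic index according to the weights."""
    return rng.choices(range(len(weights)), weights=weights, k=n_docs)

rng = random.Random(0)          # fixed seed for repeatability
K = 50                          # assumed upper bound on the number of topics
theta = sample_symmetric_dirichlet(alpha=1.0, K=K, rng=rng)
assignments = assign_documents(200, theta, rng)
used_topics = len(set(assignments))

print(f"{used_topics} of {K} candidate topics are occupied")
```

Because each concentration parameter alpha/K is small, most of the probability mass typically falls on a few components, so K acts as an upper bound while the number of occupied topics is driven by the data.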